Abstract — In this paper, short-term throughput optimal power 
allocation policies are derived for an energy harvesting transmit- 
ter with energy storage losses. In particular, the energy harvesting 
transmitter is equipped with a battery that loses a fraction 
of its stored energy. Both the single user setting, i.e., one transmitter and one
receiver, and the broadcast channel setting, i.e., one transmitter and multiple
receivers, are considered, initially with an infinite capacity
battery. It is shown that the optimal policies for these models are 
threshold policies. Specifically, storing energy when harvested 
power is above an upper threshold, retrieving energy when 
harvested power is below a lower threshold, and transmitting 
with the harvested energy in between is shown to maximize the 
weighted sum-rate. It is observed that the two thresholds are 
related through the storage efficiency of the battery, and are non- 
decreasing during the transmission. The results are then extended 
to the case with finite battery capacity, where it is shown that a 
similar double-threshold structure arises but the thresholds are 
no longer monotonic. A dynamic program that yields an optimal 
online power allocation is derived, and is shown to have a similar 
double-threshold structure. A simpler online policy is proposed 
and observed to perform close to the optimal policy. 

Index Terms — Energy harvesting, inefficient energy storage, 
optimal scheduling, wireless networks. 



I. Introduction 

Desirable aspects of future wireless applications include 
longer lifetime, smaller physical size, energy independence 
and a low carbon footprint. Energy harvesting wireless net- 
works play an important role towards providing these, allowing 
mobile devices to operate for an indefinite amount of time. 
On the other hand, networks comprising energy harvesting
nodes have their own design challenges, most prominently, 
efficient use of intermittent energy taking into consideration 
the availability and storage of harvested energy. 

Energy harvesting wireless networks have begun to receive 
attention from the wireless communication community. Efforts 
in the past few years have considered such networks and 
identified optimal policies to govern the scarce and varying energy
resource. The transmission completion time minimization
problem for an energy harvesting transmitter is considered in
[2] with discrete energy arrivals known in an offline manner,
and the policy satisfying the energy constraints is shown to
have a non-decreasing piecewise constant structure. In [3], an
energy harvesting node with limited energy storage is studied
for throughput maximization, and a similar piecewise constant
policy is found to be optimal. It is also shown in reference
[3] that the problems of time minimization and throughput
maximization are closely related. This study is extended to a
single-link fading channel in [4], where a directional water-filling
algorithm with varying water levels is proposed to
combine the energy constraints with the conventional water-filling
results for fading channels. A similar water-filling result
is obtained in [5] with an information theoretic approach.
Multiple user settings for offline problems have been studied
in [6] for broadcast channels, [7] for multiple access channels
and [8] for interference channels using variations of the
directional water-filling algorithm. A two-hop setting with
energy harvesting transmitter and relay in a static channel has
been considered in [9]. All of these aforementioned efforts
assume the presence of an energy storage unit on the node 
which is able to store the harvested energy without loss. 

An energy storage device proves to be useful in designing 
more flexible power policies by providing a buffer for the 
harvested energy. Essentially, this helps prolong the operation 
of the node since the stored energy can be used whenever 
the node is needed on the network. However, the said storage 
device in reality would have non-ideal characteristics, such as 
capacity¹ fading, energy-expenditure-rate dependent capacity,
leakage, and recovery effects. As a consequence, it is necessary 
to consider these imperfections to develop a more realistic 
model for wireless nodes and find power policies tailored 
to them. Various models have been proposed to predict the 
behavior of energy storage devices such as chemical batteries 
[10], [11]. Energy harvesting nodes utilizing batteries with
capacity fading or battery leakage were studied in [12] by
revising the approaches of [2], [3]. Storage inefficiency was
modeled in [13] as a constant loss rate per stored energy
to find asymptotically optimal policies for sufficiently large
batteries with energy neutrality constraints. Reference [14]
studied duty-cycling with constant transmission rate under 
energy neutrality conditions. 

¹Here we refer to the energy storage capacity of said device. The data
transmission related metrics are referred to as short-term throughput or rate.

This paper focuses on single transmitter communication
settings where the transmitter is energy harvesting, and a
fraction of the (harvested) energy is wasted due to imperfections
in storing energy in, as well as discharging energy from,
the energy storage device on board, such as a battery² or
supercapacitor. First, optimal offline policies maximizing the 
average rate are found for a single transmitter-receiver link, 
and are also shown to solve the broadcast channel, with 
a sufficiently large battery capacity, in Sections III and IV,
respectively. It is shown that, contrary to the results of previous
work with ideal batteries [2]-[4], [6], [7], where the
optimal policy is shown to be piece-wise constant, here, the 
optimal policy for a battery with storage losses may favor 
transmitting with the just harvested energy without storing it. 
In particular, the optimal policy is shown to have a double- 
threshold structure with non-decreasing thresholds, in which 
the transmit power equals the harvested power whenever the 
harvested power falls between the thresholds. Next, the results 
are extended to the case when the storage device is a finite 
capacity battery in Section V. For this case, it is shown that
the double threshold policy applies, while the thresholds are
instead determined as piecewise constant and decreasing or
increasing at full and empty battery instances, resembling the
power policy in [3]. Building on the intuition from the optimal
offline policy, a dynamic program to find the optimal online
policy is presented and a simpler near-optimal policy with
constant threshold levels maintaining a stable energy buffer is
proposed in Section VI. The performances of these policies are
simulated and compared to the offline policy in Section VII.

II. System Model 

We consider an energy harvesting transmitter that employs 
transmission power control to regulate the achieved rate or 
utility. The node is free to choose how much of the harvested 
energy will be utilized for transmission, storing the remaining 
portion in the on-board battery. The instantaneous transmission 
power is drawn from the energy being harvested at that instant, 
the energy previously stored in the battery, or both, depending 
on the transmission policy. 

The instantaneous utility r(p) is defined as the utility when 
the node transmits with power p. The utility of the system 
is then defined as the integral of the achieved instantaneous 
utility over the duration of operation, T. For the single link, 
r(p) can be the achieved instantaneous rate, with the system 
utility translating to the total number of bits communicated 
to the receiver. For the broadcast setting, r(p) can be any 
weighted sum of the number of bits delivered to all the 
receivers. 

The node harvests energy at a non-negative rate h(t) to 
either be used in transmission directly or to be stored in 
the battery. However, due to the inefficiency of the battery, 
a fraction of the stored power is lost. This energy loss model 
has been used before, see for example, [13]. Similarly, a loss
may occur when power is drawn from the battery. These two
losses are combined in the model and the fraction of energy
that can be drawn from the battery per unit energy stored, i.e.,
battery efficiency, is represented by $\eta$, $0 \le \eta \le 1$. The model
for the single user setting is depicted in Figure 1.

²From this point on, we use battery and energy storage device interchangeably.

Fig. 1. Energy harvesting transmitter with inefficient storage and finite battery
capacity in a single link.



Denoting the rate of energy storage by s(t) and the power 
drawn from the battery by u(t), the energy stored in the battery 
at time t is given by 

$$E_{bat}(t) = \int_0^t \big(\eta s(\tau) - u(\tau)\big)\, d\tau \tag{1}$$

and the power the transmitter receives, which is the remaining 
harvested power and the power drawn from the battery, is 
expressed as 



$$p(t) = h(t) - s(t) + u(t). \tag{2}$$

Note that the power being stored cannot exceed the harvested
power, i.e., $s(t) \le h(t)$, and both $s(t)$ and $u(t)$ are
non-negative by definition.

It is worthwhile to note that s(t) and u(t) should not be 
nonzero at the same time, since storing and drawing energy 
simultaneously may not be physically possible [15], and also
because it yields an energy loss without any storage benefit.
We will not impose such a constraint on the problem, but 
it will become evident that the optimal policy satisfies this 
constraint. Secondly, the energy drawn from the battery cannot 
exceed the energy stored in the battery at any time throughout 
the transmission. This constraint, which we will refer to as 
energy causality, is given for a total transmission duration of 
T by the set of constraints 



$$E_{bat}(t) = \int_0^t \big(\eta s(\tau) - u(\tau)\big)\, d\tau \ge 0, \qquad 0 \le t \le T. \tag{3}$$



In Section V, we shall consider nodes with a finite storage
capacity of $E_{max}$. A finite capacity implies that any power
attempted to be stored in the battery when $E_{bat} = E_{max}$
will be lost. This loss on the battery side can be avoided by
directing the excess power to the transmitter, yielding battery
capacity constraints

$$E_{bat}(t) = \int_0^t \big(\eta s(\tau) - u(\tau)\big)\, d\tau \le E_{max}, \qquad 0 \le t \le T \tag{4}$$

that prevent battery overflow. Note that this constraint does 
not allow the storage system to discard any energy. However, 
in the unlikely case that it is optimal to do so, the discard 
decision is left to the transmitter through the design of the 
utility function r(p). 
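
The storage model in (1)-(4) can be illustrated with a minimal discrete-time sketch. It is not part of the original analysis: it assumes that h(t), s(t) and u(t) are piecewise constant over slots of length `dt`, and the function name `simulate_storage` and its arguments are hypothetical.

```python
def simulate_storage(h, s, u, eta, dt=1.0, e_max=float("inf")):
    """Discrete-time illustration of (1)-(4) under a slotted-time assumption.

    h, s, u: per-slot harvested, stored and drawn powers.
    eta: storage efficiency, 0 <= eta <= 1.
    Returns (p, e_bat, feasible): transmit power per slot as in (2),
    battery energy at the end of each slot as in (1), and whether the
    constraints 0 <= s <= h, u >= 0, (3) and (4) hold in every slot.
    """
    if not (len(h) == len(s) == len(u)):
        raise ValueError("h, s and u must have the same length")
    energy = 0.0
    p, e_bat = [], []
    feasible = True
    for h_k, s_k, u_k in zip(h, s, u):
        # Storage cannot exceed the harvest; both controls are non-negative.
        if not (0.0 <= s_k <= h_k and u_k >= 0.0):
            feasible = False
        p.append(h_k - s_k + u_k)              # transmit power, eq. (2)
        energy += (eta * s_k - u_k) * dt       # battery update, eq. (1)
        # Energy causality (3) and battery capacity (4).
        if energy < -1e-12 or energy > e_max + 1e-12:
            feasible = False
        e_bat.append(energy)
    return p, e_bat, feasible
```

For example, with `eta = 0.5`, storing 2 units of power in each of two slots leaves only 2 units of energy available for later use, which is the loss that drives the threshold structure derived in the following sections.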

The problem investigated in this paper is maximizing the
average utility of the system within a deadline $T$ through a
transmission power adapted to the harvesting process and the
inefficient storage, for a single energy harvesting transmitter
and single receiver (Section III) and multiple receiver (Section IV)
settings. We shall solve this problem in an offline
setting, i.e., with the energy harvests known to the transmitter in advance.
Using the insights obtained from the optimal offline policies,
we shall propose online policies in Section VI.

III. Optimal Transmission Policy for a Single Link 

The average utility maximization problem for the model in 
Figure 1 with infinite battery capacity is defined as

$$\max_{s(t),u(t)} \; \frac{1}{T} \int_0^T r(h(t) - s(t) + u(t))\, dt \tag{5a}$$

$$\text{s.t.} \quad 0 \le \int_0^t \big(\eta s(\tau) - u(\tau)\big)\, d\tau, \quad h(t) \ge s(t) \ge 0, \quad u(t) \ge 0, \quad 0 \le t \le T \tag{5b}$$

where $\eta$ is the efficiency of the battery for the energy harvesting
transmitter. We first observe that the problem in (5) is
a convex optimization problem. The constraints on $s(t)$ and
$u(t)$ in (5b) are linear, and thus the feasible set is convex. To
show the convexity of the problem as a whole, it remains to
show that the objective function in (5a) is concave in these
variables. First, we state the concavity of $r(p)$ in $p$ through a
time-sharing argument similar to that in [8]:

Lemma 1: The maximum achievable instantaneous rate 
r(p) for a given power p is non-decreasing, continuous and 
concave in p. 

Proof: The non-decreasing property emerges from the 
ability of the transmitter to discard a fraction of the provided 
transmission power p and achieve the rate provided by any 
p' < p. Continuity follows from the following observation: 
given a power $p' < p$ arbitrarily close to $p$, the transmitter
can achieve a rate arbitrarily close to $r(p)$ by transmitting
at rate $r(p)$ for a slightly shorter time period and turning off for
a sufficient time so that an average transmit power of p' is 
achieved. 

The proof for concavity is by contradiction. For any $p_1$,
$p_2$ and $\lambda$, assume the concavity is violated at $p_\lambda = \lambda p_1 +
(1 - \lambda) p_2$. Then it is easy to see that a rate better than $r(p_\lambda)$
can be achieved by time-sharing between $r(p_1)$ and $r(p_2)$
with a sharing parameter $\lambda$. Consequently, $r(p)$ cannot be the
maximum achievable instantaneous rate. ■

Essentially, Lemma 1 suggests that the desired properties
all follow from an efficient use of the available instantaneous 
power. If it is possible to achieve a better rate with less 
power, some energy can be discarded. If it is more efficient to 
allow the node to sleep and wake-up, this can be considered 
within the instantaneous rate function by performing the 
corresponding sleep policy when needed. Similarly, if time- 
sharing between two or more power levels with an average 
power of p achieves a better rate, the node would adopt this 
time-sharing policy whenever supplied with a power of p. 
Benefits these simple policies might bring are included in 
the power-rate function, which also renders this function non- 
decreasing, continuous and concave. 
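
The effect of discarding power and time-sharing described above can be sketched numerically. The snippet below is only an illustration under stated assumptions: the raw power-rate curve is given as samples at strictly increasing powers, discarding power is modeled by a running maximum, and time-sharing is modeled by the upper concave hull of the samples. The name `rate_envelope` is hypothetical.

```python
def _cross(o, a, b):
    """Cross product of vectors o->a and o->b (non-negative for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def rate_envelope(powers, rates):
    """Non-decreasing concave envelope of a sampled power-rate curve.

    powers: strictly increasing sample powers (at least two samples).
    rates: raw rate achieved at each sample power.
    Returns the envelope evaluated at the same sample powers.
    """
    if len(powers) < 2 or len(powers) != len(rates):
        raise ValueError("need at least two samples of equal length")
    # Step 1: discarding excess power makes the rate non-decreasing.
    monotone, best = [], float("-inf")
    for r in rates:
        best = max(best, r)
        monotone.append(best)
    # Step 2: time-sharing achieves the upper concave hull of the samples.
    hull = []
    for point in zip(powers, monotone):
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], point) >= 0:
            hull.pop()
        hull.append(point)
    # Step 3: evaluate the hull at the sample powers by linear interpolation.
    envelope, j = [], 0
    for p in powers:
        while j < len(hull) - 2 and hull[j + 1][0] <= p:
            j += 1
        (p0, r0), (p1, r1) = hull[j], hull[j + 1]
        envelope.append(r0 + (r1 - r0) * (p - p0) / (p1 - p0))
    return envelope
```

In this sketch, a raw curve that is zero below some activation power is replaced by a straight line from the origin, which corresponds to the sleep and wake-up policy mentioned in the text.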

Corollary 1: Since $p(t) = h(t) - s(t) + u(t)$ is a linear
function of $s(t)$ and $u(t)$, the objective function in (5a) is
continuous and jointly concave in the variables of the problem.

The concavity of the objective function of the maximization
and the convexity of the constraint set imply that (5) is a
convex program. The Lagrangian corresponding to (5) is given
in (6), where $\lambda(t)$, $\mu(t)$, $\sigma(t)$ and $\nu(t)$ are the nonnegative
Lagrangian multipliers corresponding to the energy causality,
$s(t) \le h(t)$, and nonnegativity constraints for $s(t)$ and $u(t)$,
respectively. The optimal energy storage and use policy must
satisfy the Karush-Kuhn-Tucker (KKT) stationarity conditions
[16] found by taking the derivative with respect to both
variables at time $0 \le t \le T$ as

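$$r'(h(t) - s(t) + u(t)) - \eta \int_t^T \lambda(\tau)\, d\tau + \mu(t) - \sigma(t) = 0 \tag{7}$$

$$-r'(h(t) - s(t) + u(t)) + \int_t^T \lambda(\tau)\, d\tau - \nu(t) = 0 \tag{8}$$

for $0 \le t \le T$, where $r'(p)$ represents the derivative of $r(p)$
with respect to $p$. The corresponding complementary slackness
conditions for each Lagrangian multiplier are

$$\lambda(t) \left( \int_0^t \big(\eta s(\tau) - u(\tau)\big)\, d\tau \right) = 0 \tag{9a}$$

$$\mu(t) \big(h(t) - s(t)\big) = 0 \tag{9b}$$

$$\sigma(t) s(t) = 0, \quad \nu(t) u(t) = 0, \quad 0 \le t \le T. \tag{9c}$$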

In order to find the optimal policy, we test the KKT 
conditions above for five mutually exclusive modes of the 
transmitter that include all possible choices of s(t) and u(t). 
In cases where $r(p)$ is not strictly concave, such as when time-sharing
is employed, the solution of this problem is not unique.
To develop an algorithm with a unique output and to simplify 
the analysis, we restrict our search set by omitting the modes 
which are strictly suboptimal or can be replaced with another 
mode without loss of optimality. 

Case 1: Simultaneous charge and discharge 
In this case, the battery is being charged and discharged 
at the same time $t$, i.e., $s(t) > 0$ and $u(t) > 0$. Due to the
complementary slackness conditions in (9c), this implies that
both $\sigma(t) = 0$ and $\nu(t) = 0$. Substituting these in (7) and (8)
and adding the two equations, we get

$$(1 - \eta) \int_t^T \lambda(\tau)\, d\tau + \mu(t) = 0. \tag{10}$$

Due to $\mu(t)$ and $\lambda(t)$ being non-negative, and $0 \le \eta \le 1$, the
conditions for (10) to hold are

$$\mu(t) = 0 \tag{11}$$

$$(1 - \eta) \int_t^T \lambda(\tau)\, d\tau = 0. \tag{12}$$

(12) implies that for the transmitter to be in this mode, either
the efficiency $\eta$ needs to be 1, or $\lambda(\tau) = 0$ for all $t \le \tau \le T$.
In the former case, the storage is lossless, and a simultaneous
charge and discharge is equivalent to only charging or only
discharging with $\min(s(t), u(t))$ forwarded directly from harvested
power to the transmitter. The latter case, substituted
in (8), gives $r'(p(\tau)) = 0$ for all $t \le \tau \le T$ with $u(\tau) > 0$,
meaning that the rate is invariant to transmission power after $t$
whenever the stored energy is used. Thus, the expended power
is useless. This energy can equivalently be lost by increasing
transmission power without getting more rate. Therefore, for
both of the cases when simultaneous charge and discharge
appears optimal, there exists an equally good policy that avoids
this mode. Consequently, we can safely assume that this mode
is never used by the transmitter.

$$\mathcal{L} = \int_0^T r(h(t) - s(t) + u(t))\, dt + \int_0^T \left( \lambda(t) \int_0^t \big(\eta s(\tau) - u(\tau)\big)\, d\tau + \mu(t)\big(h(t) - s(t)\big) + \sigma(t) s(t) + \nu(t) u(t) \right) dt \tag{6}$$

Case 2: Discharging only

Since the simultaneous charge and discharge is considered
in Case 1, this case strictly refers to $u(t) > 0$ and $s(t) = 0$. In
this case, the complementary slackness condition in (9c) gives
$\nu(t) = 0$, and substituting in (8) yields

$$r'(p(t)) = \int_t^T \lambda(\tau)\, d\tau. \tag{13}$$

Note that due to (9a), $\lambda(t)$ is nonzero only when the battery
is empty. Hence, the value of $r'(p(t))$ in the discharging mode
remains constant unless the battery is depleted. Since there are
no other restrictions on $p(t)$ for optimality, one solution is to
choose a constant transmission power $p = p_u$ as the smallest
power satisfying (13), which we shall adopt for simplicity
and ease of implementation in this paper. The reason behind
choosing the smallest such value will become clear in Case 5.

Case 3: Charging only with $s(t) < h(t)$

In Cases 3 and 4, we consider the charging mode in two
parts with $s(t) < h(t)$ and $s(t) = h(t)$, respectively. For the
case with $0 < s(t) < h(t)$ and $u(t) = 0$, i.e., the harvest rate
is strictly larger than the storage rate and thus transmission
power is nonzero, complementary slackness conditions (9b)
and (9c) dictate that $\sigma(t) = 0$ and $\mu(t) = 0$. Substituting in
(7), we get

$$r'(p(t)) = \eta \int_t^T \lambda(\tau)\, d\tau. \tag{14}$$

Noticing the similarity of the above equation to (13), we
observe that $r'(p(t))$ in the charging mode also remains
constant while the battery is not depleted, and introduce a
similar restriction to the solution by choosing the largest
constant transmission power $p = p_s$ satisfying (14) in this
mode. We also note that the transmission power in discharge
mode $p_u$ and the transmission power in the charge mode $p_s$
are related through

$$\frac{r'(p_s)}{r'(p_u)} = \eta \tag{15}$$

by (13) and (14). Therefore, we can conclude that there exists
an optimal policy which charges and discharges only while
maintaining constant transmission powers $p_s$ and $p_u$, and that
these two powers are related by (15).

Case 4: Charging only with $s(t) = h(t)$

In this case, we consider storing all harvested power, i.e.,
$s(t) = h(t)$, and thus having no transmit power. Since the
constraint $h(t) \ge s(t)$ is met with equality in this case, the
corresponding Lagrangian multiplier $\mu(t)$ no longer has to be
zero as in Case 3. Thus, (7) becomes

$$r'(p(t)) = r'(0) = \eta \int_t^T \lambda(\tau)\, d\tau - \mu(t) \tag{16}$$

where $p(t) = 0$ for this case by definition. Comparing to (14),
$r'(0)$ is not greater than $r'(p_s)$ since $\mu(t) \ge 0$. Lemma 1
shows that $r(p)$ is non-decreasing, continuous and concave,
implying that $0 \ge p_s$, which is only feasible when $p_s = 0$.
Therefore, this mode is only optimal when the transmission
power for charging, defined as $p_s$ in the previous case, is zero.
As a result, these two modes can be considered jointly as the
charging mode, with the transmission power equal to $p_s$ found
from (14).
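
To make the relation between the two transmission powers concrete, the following sketch evaluates (15) for one specific rate function. The choice $r(p) = \frac{1}{2}\log_2(1 + p)$, i.e., an AWGN link with unit noise power, is an assumption made only for this illustration, and the function names are hypothetical. For this rate, (15) reduces to $(1 + p_u)/(1 + p_s) = \eta$.

```python
import math


def rate(p):
    """Assumed rate function: AWGN link with unit noise power."""
    return 0.5 * math.log2(1.0 + p)


def rate_derivative(p):
    """Derivative r'(p) of the assumed rate function."""
    return 1.0 / (2.0 * math.log(2.0) * (1.0 + p))


def charge_power(p_u, eta):
    """Charging-mode power p_s paired with discharging-mode power p_u.

    Solves r'(p_s) / r'(p_u) = eta, eq. (15), in closed form for the
    assumed rate function; requires 0 < eta <= 1.
    """
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must lie in (0, 1]")
    return (1.0 + p_u) / eta - 1.0
```

For instance, `charge_power(1.0, 0.5)` gives `3.0`: with half of the stored energy lost, the node in this example stores energy only when the harvested power exceeds three times the power level it maintains while discharging. With `eta = 1` the two powers coincide.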

Case 5: No charging or discharging 

This is the case where the node forwards all harvested power
to the transmitter, i.e., $p(t) = h(t)$ with $s(t) = u(t) = 0$,
which we shall refer to as the passive storage mode. We
assume that $h(t) > 0$ to avoid the trivial case of $h(t) = s(t) =
u(t) = p(t) = 0$. In this case, substituting $\mu(t) = 0$ in (7) and
(8) gives

$$r'(p(t)) = \eta \int_t^T \lambda(\tau)\, d\tau + \sigma(t) = r'(p_s) + \sigma(t) \ge r'(p_s) \tag{17a}$$

$$r'(p(t)) = \int_t^T \lambda(\tau)\, d\tau - \nu(t) = r'(p_u) - \nu(t) \le r'(p_u) \tag{17b}$$

implying, since $r'(p)$ is non-increasing in $p$, that the transmission
power $p(t)$ is restricted to be
within the interval $[p_u, p_s]$. Notice that this is in part due to
the selection of $p_u$ and $p_s$ as the smallest and largest power
values satisfying (13) and (14), respectively.

The analysis of Cases 1 to 5 implies that there exists an
optimal policy with the following three modes:

1) Charging only mode with $p(t) = p_s$ such that (14) is
satisfied,

2) Discharging only mode with $p(t) = p_u$ such that (13) is
satisfied, and

3) Passive storage mode with $p_u \le p(t) \le p_s$.

It is straightforward to see that the above spells a double
threshold policy on $h(t)$. When $h(t) > p_s$, transmission power
is chosen as $p_s$ and the excess energy is stored in the battery,
referring to the first mode. Conversely, when $h(t) < p_u$,
transmission power is kept at $p_u$ with the missing energy
supplied from the battery, referring to the second mode. In
between the two thresholds, i.e., when $p_u \le h(t) \le p_s$, the
node transmits with the harvested power without utilizing the
energy storage by any means, referring to the third mode. This
policy gives a unique power allocation, satisfying all KKT
conditions and the assumptions given in the above cases, so that
it performs at least as well as any other policy satisfying the
necessary conditions. Thus, the resulting policy is optimal.
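
The three modes translate directly into a per-slot rule. The sketch below is an illustration only: it assumes slotted time and takes the thresholds as given inputs, it does not check that the battery actually holds the energy being drawn, and the name `double_threshold_policy` is hypothetical.

```python
def double_threshold_policy(h, p_u, p_s):
    """Apply the double-threshold rule to a sampled harvest profile.

    h: harvested power per slot.
    p_u, p_s: lower and upper thresholds, with p_u <= p_s.
    Returns (p, s, u): transmit, stored and drawn power per slot.
    """
    if p_u > p_s:
        raise ValueError("the lower threshold cannot exceed the upper one")
    p, s, u = [], [], []
    for h_k in h:
        if h_k > p_s:
            # Charging mode: transmit at p_s and store the excess.
            p_k, s_k, u_k = p_s, h_k - p_s, 0.0
        elif h_k < p_u:
            # Discharging mode: transmit at p_u, battery supplies the gap.
            p_k, s_k, u_k = p_u, 0.0, p_u - h_k
        else:
            # Passive storage mode: transmit with the harvested power.
            p_k, s_k, u_k = h_k, 0.0, 0.0
        p.append(p_k)
        s.append(s_k)
        u.append(u_k)
    return p, s, u
```

With `h = [5, 2, 0.5]`, `p_u = 1` and `p_s = 3`, the rule stores 2 units in the first slot, leaves the battery untouched in the second, and draws 0.5 units in the third.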

At this point, what remains to obtain an optimal power
allocation scheme is to determine the values of the thresholds,
i.e., $p_u(t)$ and $p_s(t)$, throughout the transmission period $[0, T]$.
Recall that these thresholds are defined through an integral of the
Lagrangian variable $\lambda(t)$ as in (13) and (14), respectively.
Remembering that $r'(p)$ is also non-increasing in $p$, it can
be stated that these thresholds are non-decreasing in $t$ due
to $\lambda(t) \ge 0$, and remain constant as long as the battery is
non-empty due to (9a). Moreover, due to the relation in (15)
and the definitions of $p_u$ and $p_s$, given a threshold $p_u(t)$ the
corresponding $p_s(t)$ can be uniquely found and vice versa.
Therefore, it suffices to determine a non-decreasing $p_u(t)$ that
only changes at times of $E_{bat}(t) = 0$ to find the optimal policy.

In order to find a threshold function $p_u(t)$ satisfying the
properties above, we first make the following observation
parallel to [2] and [3]:

Lemma 2: There exists an optimal transmission policy that
terminates at $t = T$ with an empty battery, i.e., $E_{bat}(T) = 0$.

The proof of this statement can be found in [2] and [3]
as a necessary condition for optimality, and relates to $r(p)$
being strictly increasing. In our current setting, this requirement is
slightly relaxed to $r(p)$ being non-decreasing; thus, one might
think that this termination requirement may not be necessary.
However, Lemma 2 still holds since for any optimal transmission
policy with $E_{bat}(T) > 0$, one can construct a policy
performing at least as well by increasing the transmission
power towards the end of transmission to deplete the battery
at $T$.

We find the threshold function $p_u(t)$ by a one-dimensional
search: Knowing that it is non-decreasing and constant while
$E_{bat} > 0$, the smallest threshold $p_{u1} \ge 0$ and the corresponding
$p_{s1} \ge p_{u1}$ that depletes the battery at some time
$t_1 \le T$ is found. These thresholds are set as $p_u(t)$ and $p_s(t)$
for $t \in [0, t_1]$. If $t_1 = T$, the algorithm terminates; otherwise,
a next threshold $p_{u2} \ge p_{u1}$ that depletes the battery at a
later time $t_1 < t_2 \le T$ is found. This is repeated until the
termination condition is met.

A sample optimal transmission policy is depicted in Figure 2.
The harvesting process $h(t)$ represents a predictable
solar harvesting pattern inspired by the example in [17, Figure 1].
The first set of thresholds $p_{u1}$ and $p_{s1}$ are determined
as the smallest thresholds depleting the battery sometime in
$[0, T]$, namely $t_1$ in this figure. The second set of thresholds
$p_{u2}$ and $p_{s2}$ are determined starting from $t_1$ as the smallest
ones depleting the battery in $[t_1, T]$, with the depletion time
coinciding with the deadline $T$. With these threshold values, the
transmitter power $p(t)$ is shown in bold. Note that the lower
threshold $p_{u2}$ is not effective until the battery is charged for
the first time after $t_1$, since in this interval the battery is still
empty, and the thresholds are free to change gradually while
$E_{bat} = 0$. The energy above $p(t)$, denoted with the shaded
regions, is stored in the battery and used up to provide the
energy denoted with the dotted regions.

Fig. 2. Example optimal policy with transmission power thresholds $p_s$ and
$p_u$ and a single empty battery instance at $t_1$.

Remark 1: The optimal policy derived in this section can
be shown to converge to the results of [2] when the energy
storage is assumed to be ideal. This is when $\eta = 1$, and any
harvested power can be stored without a loss. The relationship
between the two thresholds, i.e., (15), becomes

$$\frac{r'(p_s)}{r'(p_u)} = \eta = 1. \tag{18}$$

When a strictly concave rate function is considered, such as
in [2], the rate function is strictly increasing. Thus the only
solution to (18) is at $p_s = p_u$, i.e., when the transmitter only
transmits with constant power $p = p_s = p_u$, and stores or
retrieves the harvested power accordingly. Consequently, the
optimal policy consists of constant power transmissions, with
the power level increasing only at instances of empty battery
so that $\lambda(t) > 0$. Since the infinite battery transmission completion
time minimization problem in [2] was shown in reference [3] to have a
solution identical to that of the short-term throughput maximization
problem we consider, our solution coincides
with that of [2] with ideal batteries.

IV. Optimal Transmission Policy for the 
Broadcast Channel 

In this section, we extend the results of Section III to
the multi-receiver setting. For simplicity, we consider two
receivers, although the results are easily generalizable to more
than two. The channel model is depicted in Figure 3. For
this setting, we wish to find an average rate region $\mathfrak{R}_{EH} =
\{(r_{1,avg}, r_{2,avg})\}$, which is the union of average rate pairs that
can be achieved under the energy harvesting constraints in
(2) and (3).

Fig. 3. Energy harvesting transmitter with inefficient storage in a broadcast
setting.

At any time $t$, the transmitter allocates the power $p(t)$ for
transmission, and can achieve any rate pair $(r_1, r_2) \in \mathfrak{R}(p(t))$,
where $\mathfrak{R}(p(t))$ is the achievable rate region for transmit power
$p(t)$. For example, for the static additive white Gaussian noise
(AWGN) channel, the capacity region $\mathfrak{R}_{AWGN}(p)$, achieved
by superposition coding, is known to be as in (19) when $\sigma_1^2 \le
\sigma_2^2$ [18]:

$$\mathfrak{R}_{AWGN}(p) = \left\{ (r_1, r_2) : r_1 \le \frac{1}{2} \log_2\!\left(1 + \frac{\alpha p}{\sigma_1^2}\right),\; r_2 \le \frac{1}{2} \log_2\!\left(1 + \frac{(1 - \alpha) p}{\alpha p + \sigma_2^2}\right),\; 0 \le \alpha \le 1 \right\} \tag{19}$$

For the AWGN channel, it is trivial that this region is
convex for a fixed $p$. This property can be extended beyond this
special case by pointing out the availability of time-sharing. If
two points $(r_1, r_2) \in \mathfrak{R}(p)$ and $(r_1', r_2') \in \mathfrak{R}(p)$ are achievable, then
by time-sharing the two schemes, any convex combination can
also be achieved, and thus $\mathfrak{R}(p)$ is convex for a fixed $p$.

A more relevant property to the energy harvesting problem
is the concavity of $\mathfrak{R}(p)$ in transmit power $p$. Specifically, if
two rate pairs $(r_1, r_2) \in \mathfrak{R}(p)$ and $(r_1', r_2') \in \mathfrak{R}(p')$ can be
achieved with transmit powers $p$ and $p'$, respectively, then their
convex combination, i.e., $(\lambda r_1 + (1 - \lambda) r_1', \lambda r_2 + (1 - \lambda) r_2')$,
must be within the achievable rate region for transmit power
$\lambda p + (1 - \lambda) p'$, where $0 \le \lambda \le 1$. Similar to the convexity
of $\mathfrak{R}(p)$, this argument follows from time-sharing between the
two rate pairs and corresponding powers, and applies to any
broadcast model in which time-sharing is applicable.

Combined with the linearity of the constraints on transmit
power for an energy harvesting broadcaster, the observations
above yield the following lemma:

Lemma 3: The achievable average rate region $\mathfrak{R}_{EH} =
\{(r_{1,avg}, r_{2,avg})\}$ for an energy harvesting transmitter under
power constraints (2) and (3) is convex.

Proof: It is to be shown that for any two achievable
average rate pairs, all of their convex combinations are
also achievable. Let $p(t)$ and $p'(t)$ be two feasible power
allocation policies achieving rate pairs $(r_1(t), r_2(t))$ and
$(r_1'(t), r_2'(t))$, yielding the average rate pairs $(r_{1,avg}, r_{2,avg})$
and $(r_{1,avg}', r_{2,avg}')$, respectively. A convex combination of
these average rates with parameter $\lambda$ can be achieved by employing
the transmit power $\lambda p(t) + (1 - \lambda) p'(t)$ and choosing
the rates as $(\lambda r_1(t) + (1 - \lambda) r_1'(t), \lambda r_2(t) + (1 - \lambda) r_2'(t))$. Note
that these rates are achievable due to the concavity of $\mathfrak{R}(p)$
as discussed above. The feasibility of the power allocation
follows from (2) and the constraints on $s(t)$ and $u(t)$ being linear,
i.e., if two sets of $s(t)$ and $u(t)$ satisfy these constraints, then
so does their convex combination. ■

So far, we have shown that the achievable average rate
region is convex. This is similar to the convexity of the
maximum departure region in [6]. Consequently, one can trace
its boundary by maximizing weighted sum rates, since each
boundary point will be the maximizer to at least one set of
weights. Moreover, by allocating all power to only one of
the receivers, we can deduce that this region rests inside the
box $[0, r_{1,max}] \times [0, r_{2,max}]$, where $r_{j,max}$ is the average rate
achieved when all harvested power is used for transmitting to
user $j$ as in Section III. Since three of the corner points of
this box can trivially be achieved, we can further restrict our
attention to boundary points maximizing the weighted sum
$a r_1 + r_2$ with $a > 0$.

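Given an instantaneous transmit power $p$, let the maximum
achievable weighted sum-rate which satisfies $(r_1, r_2) \in \mathfrak{R}(p)$
be $r_a^{BC}(p)$. We define the average weighted sum-rate maximization
problem, which allows us to find the boundary of $\mathfrak{R}_{EH}$, as
follows:

$$\max_{s(t),u(t)} \; \frac{1}{T} \int_0^T r_a^{BC}(h(t) - s(t) + u(t))\, dt \tag{20a}$$

$$\text{s.t.} \quad 0 \le \int_0^t \big(\eta s(\tau) - u(\tau)\big)\, d\tau, \quad h(t) \ge s(t) \ge 0, \quad u(t) \ge 0, \quad 0 \le t \le T. \tag{20b}$$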

It turns out that the policy described in Section III extends
to this setting in a straightforward manner. To show this, we
begin by characterizing $r_a^{BC}(p)$.

Lemma 4: The maximum achievable weighted sum-rate
$r_a^{BC}(p)$ for any coefficient $a > 0$ is non-decreasing, continuous
and concave in $p$.

Proof: The proof is similar to that of Lemma 1. The
non-decreasing property follows from the transmitter discarding
excess energy to achieve any rate for a lower power. By
time-sharing between an off state and any power $p$, one can get
arbitrarily close to $r_a^{BC}(p)$ using power $p - \epsilon$, where $\epsilon > 0$
is arbitrarily small, showing continuity. Finally, time-sharing
between any pair of powers $p_1$ and $p_2$ with parameter $\lambda$
ensures that the rate function is concave within $[p_1, p_2]$. ■

Comparing Lemma 1 and Lemma 4, we observe that the
rate functions have the same properties, which are sufficient
to prove the optimality of the policy in Section III. The optimal
broadcast channel power policy therefore has a
double-threshold structure with the increasing thresholds found by a
search, with the achieved weighted sum-rate at time $t$ given
by $r_a^{BC}(p(t))$.

A fair question regarding this setting, and in fact for all
multi-user models with more than one rate in the objective,
is how to choose the individual rates for the users given the
power of the broadcasting node. In a broadcast channel, the
achievable rate tuples for a given broadcast power form an
achievable region. Due to the linear structure of the objective
function, the optimal rate tuple is the one maximizing the
weighted sum-rate for the given instantaneous power. It lies on
the boundary of the achievable region, at the point having
the weight ratio $\alpha$ as a subgradient.
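
As an illustration of this rate selection, the sketch below approximates $r_\alpha^{bc}(p)$ for a two-user Gaussian broadcast channel by a grid search over the power split. The superposition-coding rate expressions, the noise powers `n1` and `n2`, and all function names are assumptions made for this example and are not taken from the analysis above.

```python
import math

def bc_rates(p, gamma, n1, n2):
    """Superposition-coding rates (bits per channel use) when a fraction
    gamma of the power p is assigned to the stronger user (n1 <= n2).
    Assumed model: degraded two-user Gaussian broadcast channel."""
    r1 = 0.5 * math.log2(1.0 + gamma * p / n1)
    r2 = 0.5 * math.log2(1.0 + (1.0 - gamma) * p / (gamma * p + n2))
    return r1, r2

def weighted_sum_rate(p, alpha, n1=1.0, n2=2.0, grid=2001):
    """Grid-search approximation of max over gamma of alpha*r1 + r2.
    Returns (value, gamma, r1, r2) for the best power split found."""
    best = (-1.0, 0.0, 0.0, 0.0)
    for k in range(grid):
        gamma = k / (grid - 1)
        r1, r2 = bc_rates(p, gamma, n1, n2)
        value = alpha * r1 + r2
        if value > best[0]:          # keep the first maximizer on ties
            best = (value, gamma, r1, r2)
    return best

# Sweeping alpha at a fixed power moves the chosen point along the boundary.
for alpha in (0.2, 0.75, 1.0):
    value, gamma, r1, r2 = weighted_sum_rate(10.0, alpha)
    print(f"alpha={alpha}: gamma={gamma:.3f}, r1={r1:.3f}, r2={r2:.3f}")
```

For small $\alpha$ the search assigns all power to the weaker user, for large $\alpha$ to the stronger user, and in between it selects an interior split, which is the boundary point whose supporting line has slope set by $\alpha$.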

An interesting outcome is that the double-threshold structure
of the optimal policy is valid for any weight ratio $\alpha$. That said,
$\alpha$ is a critical parameter of the system, since the threshold
values relate to $\alpha$ through $r_\alpha^{bc}(p)$ and (13), with the
time-sharing coding scheme that achieves a particular point on
$r_\alpha^{bc}(p)$ also depending on the structure of the weighted
sum-rate.

Remark 2: The results of this section can also be shown to
converge to previous results on the broadcast channel in [6]
when storage efficiency $\eta = 1$. In this case, the thresholds are
once again found to be equal as in Remark 1, and total power
levels that are piecewise constant and nondecreasing throughout
the transmission are therefore found to be optimal, consistent
with [6, Lemma 3] for an energy harvesting broadcast node with
ideal and infinite energy storage.

V. Extension to Finite Battery Models 

In practice, it is likely that the storage device is of finite 
capacity, or it might be beneficial for design purposes to have 
as small a storage device as possible. Thus, it is relevant to 
consider the single user problem in (0 by including the battery 
capacity constraint in ®, The problem thus becomes 

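$$\max_{s(t),u(t)} \; \frac{1}{T} \int_0^T r\big(h(t) - s(t) + u(t)\big)\, dt \tag{21a}$$

$$\text{s.t.} \quad 0 \leq \int_0^t \eta s(\tau) - u(\tau)\, d\tau \leq E_{\max}, \qquad h(t) \geq s(t) \geq 0, \quad u(t) \geq 0, \quad 0 \leq t \leq T. \tag{21b}$$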

With the added energy storage limitation of $E_{\max}$, the
Lagrangian of (21) becomes (22), where $\beta(t)$ is the non-negative
Lagrangian multiplier for the new constraint, with
the corresponding complementary slackness condition

$$\beta(t) \left( \int_0^t \eta s(\tau) - u(\tau)\, d\tau - E_{\max} \right) = 0, \qquad 0 \leq t \leq T \tag{23}$$


in addition to the ones listed in (0. Substituting this modified 
Lagrangian in Cases 1 to 5 of Section III, we observe that
the threshold values are still related through (15), and can be
expressed as

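$$r'(p_u(t)) = \int_t^T \lambda(\tau)\, d\tau - \int_t^T \beta(\tau)\, d\tau \tag{24}$$

$$r'(p_s(t)) = \eta \left( \int_t^T \lambda(\tau)\, d\tau - \int_t^T \beta(\tau)\, d\tau \right). \tag{25}$$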
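
Dividing (25) by (24) gives $r'(p_s(t)) = \eta\, r'(p_u(t))$, so either threshold determines the other. The sketch below computes $p_s$ from $p_u$, first in closed form for an assumed rate function $r(p) = \frac{1}{2}\log_2(1+p)$ with unit noise power, and then numerically for any concave rate with a decreasing derivative. The function names and the search bound `hi` are hypothetical.

```python
import math

def storage_threshold_awgn(p_u, eta):
    """Closed form for the assumed rate r(p) = 0.5*log2(1+p):
    r'(p_s) = eta * r'(p_u)  <=>  (1 + p_u) / (1 + p_s) = eta."""
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must lie in (0, 1]")
    return (1.0 + p_u) / eta - 1.0

def storage_threshold(r_prime, p_u, eta, hi=1e6, iters=200):
    """Bisection on [p_u, hi] for r'(p_s) = eta * r'(p_u), assuming that
    r_prime is positive and strictly decreasing (concave rate)."""
    target = eta * r_prime(p_u)
    lo = p_u
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if r_prime(mid) > target:   # slope still too steep: move right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Derivative of the assumed AWGN rate, used to cross-check both versions.
awgn_slope = lambda p: 0.5 / ((1.0 + p) * math.log(2.0))
print(storage_threshold_awgn(3.0, 0.5), storage_threshold(awgn_slope, 3.0, 0.5))
```

Under this assumed rate function, a lower efficiency pushes $p_s$ further above $p_u$, which widens the range of harvested powers that are transmitted directly.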



In the infinite battery case, non-negativity of $\lambda(t)$ implied
non-decreasing threshold powers, and the complementary
slackness condition in (9a) implied that the thresholds could
only increase when the battery was empty. With the finite
battery constraint, due to the added $\beta(t)$ terms in (24) and (25),
this statement is revised as follows: The thresholds can only
increase when the battery is empty, $E_{bat} = 0$, and can only
decrease when the battery is full, $E_{bat} = E_{\max}$. Similar to its
counterpart, the second statement follows from the condition
in (23), which requires that at any given $t$ either $\beta(t)$ is zero
or the battery is full.

What remains is to find an algorithm that gives the optimum
threshold levels for $p_u$ or $p_s$ satisfying the above conditions.
An optimal policy must follow the restrictions above while
ensuring that the transmission terminates with $E_{bat}(T) = 0$
to avoid suboptimality due to energy loss. This statement
provides a sufficient decision metric to find the optimal
threshold levels. Consider that for some epoch $[t_0, t_1]$, the
thresholds $p_s$ and $p_u$ are a candidate pair. For these thresholds
to increase or decrease at $t_1$ and yield the next thresholds,
the battery must be empty or full, respectively. Assume first
that $E_{bat}(t_1) = 0$, indicating that the thresholds will increase
and thus less energy will be stored in the next epoch than
what would have been stored if the same thresholds were to
extend beyond $t_1$. Therefore, looking at the next battery event
if $p_s$ and $p_u$ extended beyond $t_1$ gives important information
about the possible threshold changes in the future. If this is
another empty battery event, storing less energy with the next
thresholds would yield an empty battery even earlier, and the
node would be forced to transmit suboptimally. Conversely,
assume that $E_{bat}(t_1) = E_{\max}$, and the next thresholds are thus
lower than the candidate thresholds, storing relatively more
energy after $t_1$. If the next battery event for the candidate thresholds
is another full battery event, storing more energy would cause
energy overflow, which is suboptimal. If, on the other hand,
the candidate thresholds can transmit feasibly until deadline
$T$, an empty battery event does not come up until $T$, and an
empty battery at $T$ is not possible with an ever decreasing set
of thresholds. With these cases ruled out, the decision for an
optimal candidate pair can be summarized as follows:

Lemma 5: The lowest threshold yielding an empty battery
at some time $t_1 < T$ is optimal only if the same threshold,
when applied past $t_1$, yields a full battery at some $t_2$ such that
$T > t_2 > t_1$, or does not yield a battery event until the end
of transmission. Conversely, the highest threshold yielding a
full battery at some time $\tilde{t}_1 < T$ is optimal only if the same
threshold, when applied past $\tilde{t}_1$, yields an empty battery at
some $\tilde{t}_2$ such that $T > \tilde{t}_2 > \tilde{t}_1$.

With the restrictions in Lemma 5, the optimal threshold
function $p_u(t)$ and the corresponding $p_s(t)$ can be determined
by a search algorithm. First, the smallest and largest candidates
depleting or filling the battery at some times $t_1$ and $\tilde{t}_1$,
respectively, are found. Out of the two candidates, the one
satisfying the relevant condition in Lemma 5 is chosen as the
optimal threshold. The procedure is then repeated for the next
set of thresholds until a feasible set of thresholds depleting the
battery at $t = T$ is found, at which point the algorithm terminates.
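
The basic operation that such a search relies on is finding the next battery event under a fixed candidate pair of thresholds. A minimal discrete-time sketch is given below. The sampled harvest sequence `h`, the step `dt`, and the function name are assumptions for illustration, and the outer search over candidates is not shown.

```python
def next_battery_event(h, p_u, p_s, eta, e_max, e0=0.0, dt=1.0):
    """Apply fixed thresholds (p_u <= p_s) to harvest samples h.
    Returns (event, index, trace), where event is "empty", "full" or None
    (no battery event before the samples run out)."""
    energy = e0
    trace = []
    for k, hk in enumerate(h):
        if hk > p_s:                       # store the excess with efficiency eta
            energy += eta * (hk - p_s) * dt
        elif hk < p_u:                     # draw the deficit from the battery
            energy -= (p_u - hk) * dt
        trace.append(energy)               # otherwise transmit h directly
        if hk < p_u and energy <= 1e-12:
            return "empty", k, trace
        if hk > p_s and energy >= e_max - 1e-12:
            return "full", k, trace
    return None, len(h), trace

# Two storing samples followed by draws: the battery empties at index 3.
print(next_battery_event([5, 5, 0, 0, 0], p_u=1, p_s=3, eta=0.5, e_max=10))
```

Running this routine for a candidate pair beyond $t_1$ reveals whether the next event is an empty or a full battery, which is the information the condition in Lemma 5 is checked against.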

It is necessary for completeness to point out that, for all
possible realizations of the candidate thresholds, there exists
exactly one candidate that satisfies the criteria in
Lemma 5. This ensures that the proposed algorithm always
finds a policy, and that this policy is unique.

Lemma 6: For any pair of candidate thresholds found as
the minimum and maximum battery depleting and filling
thresholds, there exists exactly one candidate satisfying the
corresponding criteria in Lemma 5.

Proof: Let the two distinct candidates for the threshold $p_s$
be $p_s^{(0)}$ and $p_s^{(E_{\max})}$, yielding an empty battery and
a full battery at some $t_1$ and $\tilde{t}_1$, respectively. Clearly,
$p_s^{(0)} > p_s^{(E_{\max})}$, since otherwise one of the thresholds violates
the energy availability or battery capacity constraint at
$\min(t_1, \tilde{t}_1)$. Assume that both candidates satisfy the conditions
of Lemma 5: the threshold $p_s^{(0)}$ fills the battery at some $t_2 < T$
or extends feasibly to $T$, and the threshold $p_s^{(E_{\max})}$ depletes
the battery at some $\tilde{t}_2 < T$ by construction. If $\tilde{t}_2 < t_1$, the
first candidate is not feasible since it must deplete the battery
before $\tilde{t}_2$ due to $p_s^{(0)} > p_s^{(E_{\max})}$. Else if $t_1 < \tilde{t}_2 < t_2$, the
first candidate must deplete the battery again before $\tilde{t}_2$, and thus
cannot satisfy the conditions of the Lemma. Lastly, if $\tilde{t}_2 > t_2$,
then the second candidate yields a battery overflow at $t_2$ and thus
is not a feasible candidate. Thus, both candidates cannot satisfy the
conditions of Lemma 5.

$$\begin{aligned}
\mathcal{L} = {} & \int_0^T r\big(h(t) - s(t) + u(t)\big)\, dt + \int_0^T \lambda(t) \int_0^t \eta s(\tau) - u(\tau)\, d\tau\, dt - \int_0^T \beta(t) \left( \int_0^t \eta s(\tau) - u(\tau)\, d\tau - E_{\max} \right) dt \\
& - \int_0^T \mu(t)\big(h(t) - s(t)\big)\, dt + \int_0^T \sigma(t) s(t)\, dt + \int_0^T \nu(t) u(t)\, dt
\end{aligned} \tag{22}$$

Conversely, assume that $p_s^{(0)}$ does not satisfy the conditions,
i.e., any feasible candidate $p_s^{(0)}$ depletes the battery again
at some $t_2$. Then, starting from $p_s^{(0)}$ and decreasing this
threshold, one can always find a value at which the battery is
full at some $\tilde{t}_1$ and is depleted at some $\tilde{t}_2 > \tilde{t}_1$. Choosing this
value as $p_s^{(E_{\max})}$, the second threshold satisfies the required
conditions.

Finally, assume that $p_s^{(E_{\max})}$ does not satisfy the conditions,
i.e., any feasible candidate $p_s^{(E_{\max})}$ cannot deplete the battery
at any future instance $\tilde{t}_2 > \tilde{t}_1$. Then, any threshold larger than
$p_s^{(E_{\max})}$ cannot yield a full battery at any time $t < T$, and
therefore the smallest such threshold depleting the battery at
some $t_1$ must extend to $T$, making it a feasible candidate for
$p_s^{(0)}$. In short, the two candidates cannot satisfy the conditions
of Lemma 5 simultaneously, and the failure of either candidate
implies the success of the other candidate, proving that there
is exactly one candidate that is optimal. ■

Since the problems are mathematically identical, this solu- 
tion also extends to the broadcast setting. 

Remark 3: Similar to Remarks 1 and 2, the policy for
the finite battery case derived in this section also coincides
with previous results for the ideal battery case studied in [3].
When $\eta = 1$ and $r(p)$ is strictly concave, the thresholds are
equal and thus the optimal policy is a constant power policy.
The power levels increase and decrease at empty and full
storage instances, and are chosen by a criterion analogous to
that of Lemma 5, as in [3, Theorem 1].

VI. Online Transmission Policies 

In the previous sections, the optimal policy is found by 
analyzing the harvesting process h(t) over the entire trans- 
mission duration [0,T]. Therefore it is necessary to know the 
realization of the energy harvesting non-causally in order to 
determine the optimal transmission powers, i.e., the policy 
is found in an offline manner. This approach provides us a 
benchmark solution as well as insights for efficient power 
allocation, besides being applicable in some special cases 
where the harvested energy is highly predictable or controlled. 
However, such knowledge may not be available in all energy 
harvesting applications. In this section, we develop online policies
that require only the distribution and causal harvesting
information, based on the double-threshold structure of the
offline policies of earlier sections.

A. Optimal Online Policy 

Without non-causal information of the harvested energy, the
transmitter needs to determine its action based only on the
current states as well as the previous realizations of the arrival
process. The best such policy can be found using dynamic
programming [19]. This approach realizes a recursive definition
of the value function, with the desired action being the value
maximizer, and solves for the optimal action iteratively.

For an energy harvesting transmitter, the states at time $t$
are the energy stored in the battery, $E_{bat}(t)$, the causal harvested
energy information $h^t = \{h(\tau),\ 0 \leq \tau \leq t\}$, and the
time to deadline, $T - t$. Based on these states, the node
decides on its action, i.e., transmit power, through the function
$\phi(E_{bat}(t), h^t, T - t)$. The value function, i.e., the expected
throughput of the system starting from the given state, is
expressed as

$$V(E_{bat}, h^t, T - t) = \max_{\phi} \mathbb{E}\left[ \int_t^T r\big(\phi(E_{bat}(\tau), h^\tau, T - \tau)\big)\, d\tau \right] \tag{26}$$

The optimal action $\phi(\cdot)$ is then chosen as the argument that
maximizes the above value function.

To solve this problem iteratively, we need to express this
value function as a recursive relation. Thus, we approximate
the integral in (26) as a Riemann sum with interval length
$\delta$. Since the contribution of the next interval is determined by
the immediate action $\phi(E_{bat}(t), h^t, T - t)$, the value function is
given by the Bellman equation in (27).

Taking the expectation over the distribution of the harvesting
process, this equation can be solved iteratively. However, it is
possible to further decrease the dimension of the problem to
make it more tractable. For example, if the arrival process
is Markovian or i.i.d., the past states do not provide any
additional information about the process. Thus, the state $h^t$
can be replaced by only the current harvesting rate $h(t)$. The
time until the end of transmission, $T - t$, is helpful towards the
very end of the transmission, when the node desires to fully
consume its energy. For sufficiently large $T$, or for an infinite
deadline, this state can be ignored, significantly reducing
computational load. In such cases, a discount factor $\beta$ is
added to stabilize the value function. With these assumptions,
the Bellman equation in (27) reduces to (28). Notice that the
battery state in the value function on the right hand side of (28)
can be found from the states at $t$. The battery state changes
linearly with $\phi$, with slope $-\eta$ when $\phi(E, h) < h(t)$ and slope
$-1$ when $\phi(E, h) > h(t)$. This kink in the state transition suggests
that $\phi(E, h) = h(t)$, i.e., transmitting with the harvested power, is
an optimal action in some cases.
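
The reduced recursion can be solved numerically by value iteration on a discretized state space. The sketch below is one possible implementation under stated assumptions: a normalized rate $r(p) = \frac{1}{2}\log_2(1+p)$, i.i.d. harvest levels that are equiprobable on a small grid, linear interpolation of the value function over the battery axis, and illustrative parameter values that are not those of the simulations in this paper. All names are hypothetical.

```python
import numpy as np

def rate(p):
    """Assumed rate function: AWGN with unit noise power."""
    return 0.5 * np.log2(1.0 + p)

def value_iteration(eta=0.7, e_max=5.0, h_max=4.0, delta=1.0, beta=0.9,
                    n_e=21, n_h=5, n_phi=41, sweeps=80):
    e_grid = np.linspace(0.0, e_max, n_e)     # battery states
    h_grid = np.linspace(0.0, h_max, n_h)     # equiprobable harvest levels
    # Common action grid; the harvest levels are included so that the
    # action phi = h (the kink of the battery transition) is available.
    phi_all = np.union1d(np.linspace(0.0, h_max + e_max / delta, n_phi), h_grid)
    V = np.zeros((n_e, n_h))
    policy = np.zeros((n_e, n_h))
    for _ in range(sweeps):
        expected_v = V.mean(axis=1)           # E over next harvest, per battery level
        V_new = np.empty_like(V)
        for i, e in enumerate(e_grid):
            for j, h in enumerate(h_grid):
                # Feasible powers: cannot use more than harvest plus stored energy.
                phi = phi_all[phi_all <= h + e / delta + 1e-12]
                surplus = (h - phi) * delta
                # Surplus is stored with efficiency eta, deficit is drawn at unit cost.
                e_next = e + np.where(surplus > 0.0, eta * surplus, surplus)
                e_next = np.clip(e_next, 0.0, e_max)   # overflow is lost
                q = rate(phi) * delta + beta * np.interp(e_next, e_grid, expected_v)
                best = int(np.argmax(q))
                V_new[i, j] = q[best]
                policy[i, j] = phi[best]
        V = V_new
    return e_grid, h_grid, V, policy

e_grid, h_grid, V, policy = value_iteration()
print(np.round(policy, 2))   # rows: battery level, columns: harvest level
```

The printed table can be inspected for the states in which the chosen power equals the harvested power, which correspond to the kink discussed above. Finer grids trade run time for accuracy.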

A sample solution to the dynamic program in (28) is given
in Figure 4. The figure shows the optimal action $\phi(E, h)$ in
an AWGN channel when the harvested power $h$ within an
interval of length $\delta$ is distributed independently and uniformly
in $[0, 20]$ mW, with a battery capacity of $E_{\max} = 100$ mJ.
It can be observed that the optimal transmit power is equal
to the harvested power for a range of states, marked by I,
above and below which the transmit powers remain constant
(II and III respectively). Moreover, for a fixed stored energy,
the two thresholds can be shown to satisfy (15). This aligns
perfectly with the optimal offline two-threshold policy, with
the thresholds adapting to stored energy rather than
non-causally known energy harvests.

$$\begin{aligned}
V(E_{bat}, h^t, T - t) &= \max_{\phi} \mathbb{E}\left[ r\big(\phi(E_{bat}, h^t, T - t)\big)\delta + \int_{t+\delta}^T r\big(\phi(E_{bat}(\tau), h^\tau, T - \tau)\big)\, d\tau \right] \\
&= \max_{\phi}\; r\big(\phi(E_{bat}, h^t, T - t)\big)\delta + \mathbb{E}\left[ V(E_{bat}(t + \delta), h^{t+\delta}, T - t - \delta) \right]
\end{aligned} \tag{27}$$

$$V(E_{bat}, h) = \max_{\phi}\; r\big(\phi(E_{bat}, h)\big)\delta + \beta\, \mathbb{E}\left[ V(E_{bat}(t + \delta), h(t + \delta)) \right]. \tag{28}$$

Fig. 4. Optimal online transmission power as a function of node states.

B. Proposed Online Policy 

The main insight of the offline policies is that the double-threshold
structure in Sections III and V applies at any time,
but the thresholds vary based on the harvesting process.
However, in the absence of future harvesting information, it is
computationally difficult to estimate the optimum thresholds
without solving the dynamic program in Section VI-A. That
said, the threshold structure with the relation in (15) can still be
utilized with thresholds set independently of future harvesting
values. With this in mind, we now propose a simple online
policy as follows.

Assuming that the distribution of the harvesting process is
known as $f_h(p)$, we propose finding fixed thresholds $p_s(t) = p_s$
and $p_u(t) = p_u$ that simultaneously satisfy

$$\int_{p_s}^{\infty} \eta\, (p - p_s)\, f_h(p)\, dp - \int_0^{p_u} (p_u - p)\, f_h(p)\, dp = 0, \qquad \frac{r'(p_s)}{r'(p_u)} = \eta. \tag{29}$$

The first equation in (29) provides long-term energy stability
by ensuring that the expected energy stored in and drawn from
the battery are equal, so that the energy storage is neither
underutilized nor filled with an excessive amount of energy that
is never used. Note that as $\eta \to 1$, this reduces to a constant
power policy that preserves energy-neutrality, and resembles
the best-effort transmission scheme of [20], which is optimal in
the information theoretic sense for infinite length transmission.
On the other hand, at $\eta = 0$, (29) is only satisfied with $p_u = 0$
and $p_s \to \infty$. This means that transmit power is supplied
directly by the harvested power, $p(t) = h(t)$, which is optimal
since at this efficiency, energy storage is useless. Thus, the
proposed online policy achieves the capacity for these two
extreme values of $\eta$ in the asymptotic case.
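
For a concrete distribution, the two conditions can be solved with a one-dimensional search. The sketch below assumes harvested power uniform on $[0, h_{\max}]$ and a normalized rate $r(p) = \frac{1}{2}\log_2(1+p)$, and it reads the first condition in (29) as the balance $\eta\, \mathbb{E}[(h - p_s)^+] = \mathbb{E}[(p_u - h)^+]$ between expected stored and drawn energy. That reading, the closed-form integrals and the function name are assumptions of this example.

```python
import math

def fixed_thresholds_uniform(eta, h_max, iters=100):
    """Fixed thresholds (p_u, p_s) for h ~ Uniform[0, h_max] under the assumed
    rate r(p) = 0.5*log2(1+p), for which r'(p_s)/r'(p_u) = eta gives
    p_s = (1 + p_u)/eta - 1."""
    if eta <= 0.0:
        return 0.0, math.inf               # storage is useless: never store or draw

    def p_s_of(p_u):
        return (1.0 + p_u) / eta - 1.0

    def imbalance(p_u):
        p_s = p_s_of(p_u)
        stored = eta * max(h_max - p_s, 0.0) ** 2 / (2.0 * h_max)  # eta*E[(h-p_s)^+]
        drawn = p_u ** 2 / (2.0 * h_max)                           # E[(p_u-h)^+]
        return stored - drawn

    lo, hi = 0.0, h_max                    # imbalance is decreasing in p_u
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if imbalance(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    p_u = 0.5 * (lo + hi)
    return p_u, p_s_of(p_u)

for eta in (0.01, 0.5, 1.0):
    print(eta, fixed_thresholds_uniform(eta, 20.0))
```

With these assumptions, $\eta = 1$ gives $p_u = p_s = h_{\max}/2$, the mean harvesting rate, while a very small $\eta$ drives $p_u$ to zero and $p_s$ above $h_{\max}$, so the node transmits with the harvested power only. These match the two limiting behaviors described above.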

VII. Numerical Results 

To demonstrate the performance of the offline optimal
policy and the online policies, in this section we present
numerical results from simulations. Since it is the more realistic
model, we focus on the finite battery case, noting that the
resulting insights are similar for the infinite-battery counterpart.

We first focus on a single receiver setting. We consider
an energy harvesting transmitter node equipped with a battery
of capacity 100 mJ. We assume that the communication
channel has Gaussian noise with noise spectral density
$N_0 = 10^{-19}$ W/Hz at the receiver, and a bandwidth of 1 MHz. The
path loss between the transmitter and receiver is $-100$ dB.
The transmit duration is taken to be $T = 10000$ seconds. For
practical purposes, the continuous model is approximated via
sampling at 100 samples per second. The harvesting process
$h(t)$ at each sample point is generated in an i.i.d. fashion,
distributed uniformly in $[0, 40]$ mW.

Figure 5 shows the average rates achieved with the optimal
offline policy of Section V and the online policies of Section VI
in comparison with two alternative naive algorithms
as a function of storage efficiency $\eta$. The hasty algorithm
uses up the energy as it is harvested, i.e., $p(t) = h(t)$, and
its performance is therefore independent of storage efficiency.
This algorithm performs relatively well for small values of $\eta$ as
expected, but is surpassed by the others as storage becomes
a feasible option. The constant algorithm targets a constant
transmission level $p_c$ equal to the average harvesting rate.
Although optimal for an infinite and efficient battery in the
asymptotic case, this algorithm relies significantly on energy
storage and therefore fails for smaller values of $\eta$.
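
Both naive algorithms can be viewed as special cases of a fixed double-threshold policy, which makes them easy to simulate with one routine: the hasty algorithm corresponds to $p_u = 0$, $p_s = \infty$, and the constant algorithm to $p_u = p_s = p_c$. The sketch below uses the channel parameters listed above but a shorter horizon, and it assumes the rate $B \log_2(1 + \mathrm{SNR})$. The rate expression, the random seed and all names are assumptions of this example, so its numbers are not those of Figure 5.

```python
import math
import numpy as np

def average_rate(h, p_u, p_s, eta, e_max, dt, snr_per_watt, bandwidth):
    """Average rate (bits/s) of a fixed double-threshold policy on samples h."""
    energy = 0.0
    total_bits = 0.0
    for hk in h:
        if hk > p_s:                                  # store excess, lossy, capped
            energy = min(energy + eta * (hk - p_s) * dt, e_max)
            p = p_s
        elif hk < p_u:                                # draw what the battery allows
            drawn = min((p_u - hk) * dt, energy)
            energy -= drawn
            p = hk + drawn / dt
        else:                                         # transmit harvested power
            p = hk
        total_bits += bandwidth * math.log2(1.0 + snr_per_watt * p) * dt
    return total_bits / (len(h) * dt)

N0, B, GAIN = 1e-19, 1e6, 1e-10                       # W/Hz, Hz, -100 dB path loss
SNR_PER_WATT = GAIN / (N0 * B)                        # received SNR per transmit watt
E_MAX, DT = 0.1, 0.01                                 # 100 mJ, 100 samples per second

rng = np.random.default_rng(0)
h = rng.uniform(0.0, 0.04, size=10_000)               # i.i.d. uniform in [0, 40] mW
p_c = float(h.mean())                                 # constant target power

def hasty(eta):
    return average_rate(h, 0.0, math.inf, eta, E_MAX, DT, SNR_PER_WATT, B)

def constant(eta):
    return average_rate(h, p_c, p_c, eta, E_MAX, DT, SNR_PER_WATT, B)

for eta in (0.0, 0.5, 1.0):
    print(f"eta={eta}: hasty={hasty(eta):.3e}, constant={constant(eta):.3e}")
```

In this sketch the hasty rate does not depend on $\eta$, while the constant policy falls below it when storage is useless and exceeds it when storage is lossless.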

As seen in the figure, an efficient battery provides a
significant performance advantage in all cases except the
hasty policy. The hasty algorithm is optimal at $\eta = 0$,
while the constant power policy approaches the optimal online
policy with increasing storage efficiency. The proposed online
policy performs at least as well as both the hasty and constant
power algorithms for all values of $\eta$ by mimicking both in the two
extreme cases. In this plot, as well as in extensive simulations
not presented here, it is observed that the proposed online
algorithm performs well in comparison with the
online optimal policy, while both remain notably close to
the optimal offline upper bound in the absence of non-causal
harvesting information.

Fig. 5. Average transmission rates versus battery efficiency for the optimal
and proposed online algorithm in comparison to naive online algorithms.

Fig. 6. Average transmission rate regions in an energy harvesting broadcast
with $\eta = 0.5$.

To observe how these results carry over to the broadcast channel,
we perform a simulation with identical parameters and
two receivers while fixing the efficiency as $\eta = 0.6$, and plot
the achievable average rate region for the two user Gaussian
broadcast setting in Figure 6. As described in Section IV, this
region is determined by tracing its boundary using a range
of values for the ratio $\alpha$. For each value of $\alpha$, the maximum
weighted sum is calculated, yielding the average rates $R_1$ and
$R_2$ on the boundary with tangent $\alpha$. In comparison to the rates
achieved with the naive algorithms, the achievable regions for
online and offline policies are shown in Figure 6 when the path
losses for the two users are $-100$ dB and $-103$ dB, respectively.



It is observed that the optimal offline policy allows a larger 
rate region to be achieved compared to the naive algorithms, 
while the proposed online algorithm achieves very close to the 
optimal online boundary and fairly close to the optimal offline 
boundary. 

VIII. Conclusion 

In this paper, the optimal transmit power policy for an 
energy harvesting transmitter with an inefficient energy storage 
device was identified. For an infinite battery, it was shown that 
the optimal policy has a double-threshold structure, where the 
thresholds are related and are a function of the harvesting 
process and storage efficiency. The thresholds were shown to 
be non-decreasing with specific properties that allow them to 
be found using a simple search algorithm. Using the single 
user policy, in the broadcast setting the weighted sum rate 
maximizing policy was shown to have an identical structure. 
The results were then extended to the case with a finite storage 
capacity. It was observed that, while differing significantly
when the battery is inefficient, the optimal transmission policies
proposed in this paper converge to previous results as the
efficiency goes to 1. Additionally, the optimal online policy was
found using dynamic programming, and was shown to have 
the two-threshold structure with the thresholds adapting to 
battery state. Based on the insights from these results, a fixed- 
threshold online policy was proposed and shown to perform 
notably well in a single user setting with finite battery capacity 
compared to other naive power allocation algorithms, while 
closely tracking the optimal online policy. 

An interesting insight of this study is that when battery 
inefficiency is considered, the optimal power policy is no 
longer piecewise constant as was the case in previous work 
with ideal batteries. In fact, in between the two thresholds, 
the optimal transmitter power turns out to be equal to the 
harvested power, dictating using up the harvested energy 
without storing. A relevant future direction is thus developing 
efficient and practical coding schemes for a transmission with 
a varying power constraint unknown to the receiver. Another 
topic of interest is the extension to multiple energy harvesting 
transmitter scenarios. 